Client Report - Elliot

Unit 1 Stretch

Author

Ezekial Curran

Background

Through history names at one point or another were considered new or novel. Overtime certain names came into a pool of commonly used names that parents would then pull from rather than coming up with a new one.

With the rise of Christianity, certain trends in naming practices manifested. Christians were encouraged to name their children after saints and martyrs of the church. These early Christian names can be found in many cultures today, in various forms. These were spread by early missionaries throughout the Mediterranean basin and Europe.

By the Middle Ages, the Christian influence on naming practices was pervasive. Each culture had its pool of names, which were a combination of native names and early Christian names that had been in the language long enough to be considered native. Now the question isn’t just what are native names but what are the greatest influences of parents naming their children? ref

The data is stored in a CSV file that gives the number of times a specific name was given to a child in a specific year.

Imports and Functions

Show the code
import pandas as pd
import polars as pl
import numpy as np
from lets_plot import *

LetsPlot.setup_html(isolated_frame=True)
Show the code
# Data Dictionary: https://github.com/byuidatascience/data4names/blob/master/data.md
# df_url = pd.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv")
df = pd.read_csv("names_year.csv")

# df_url_pl = pl.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv", infer_schema_length=100000)
df_pl = pl.read_csv("names_year.csv", infer_schema_length=100000)

Question 1

  1. Create a chart using the data on the use of the name Elliot. Compare its use with the releases of the movie E.T.
Show the code
# Pandas
print(f"Creating the graph using a Pandas DataFrame\n")
df_ell = df.query('name == ["Elliot"]')
ell = (
    ggplot(data=df_ell)
    + geom_line(mapping = aes(x = 'year', y = 'Total', color='name'))
    + scale_color_manual(values={'Elliot': 'blue'})
    + scale_x_continuous(limits=[1950,2025], format = 'd', expand=[0,0])
    + scale_y_continuous(format='d')
    + labs(
        title="Elliot... What?"
    )
    + theme(panel_grid=element_line(linetype=1, color = "white"),
        panel_background=element_rect(fill="#e5ecf6", linetype=0),
        axis_line_x=element_blank(),
        axis_ticks_x=element_blank()
        )
    + geom_text(x = 1982, y = 1300, label="E.T. Released", hjust="right", size=6)
    + geom_text(x = 1985, y = 1300, label="Second Released", hjust="left", size=6)
    + geom_text(x = 2002, y = 1300, label="Third Released", hjust="left", size=6)
    + geom_vline(xintercept=1982, color="red", linetype="dashed")
    + geom_vline(xintercept=1985, color="red", linetype="dashed")
    + geom_vline(xintercept=2002, color="red", linetype="dashed")
)
display(ell)
ggsave(ell, filename="elliot_pandas.svg", path="./plots/")
# Polars
print(f"\nUsing a Polars DataFrame")
df_ell_pl = df_pl.filter(pl.col('name').is_in(["Elliot"]))
ell_pl = (
    ggplot(data=df_ell_pl)
    + geom_line(mapping = aes(x = 'year', y = 'Total', color='name'))
    + scale_color_manual(values={'Elliot': 'blue'})
    + scale_x_continuous(limits=[1950,2025], format = 'd', expand=[0, 0])
    + scale_y_continuous(format='d')
    + labs(
        title="Elliot... What?"
    )
    + theme(panel_grid=element_line(linetype=1, color = "white"),
        panel_background=element_rect(fill="#e5ecf6", linetype=0),
        axis_line_x=element_blank(),
        axis_ticks_x=element_blank()
    )
    + geom_text(x = 1982, y = 1300, label="E.T. Released", hjust="right", size=6)
    + geom_text(x = 1985, y = 1300, label="Second Released", hjust="left", size=6)
    + geom_text(x = 2002, y = 1300, label="Third Released", hjust="left", size=6)
    + geom_vline(xintercept=1982, color="red", linetype="dashed")
    + geom_vline(xintercept=1985, color="red", linetype="dashed")
    + geom_vline(xintercept=2002, color="red", linetype="dashed")
)
display(ell_pl)
# the sv is simply to suppress the output
sv = ggsave(ell_pl, filename="elliot_polars.svg", path="./plots/")
Creating the graph using a Pandas DataFrame

Using a Polars DataFrame